
[PRIM][IR]Complete IR vjp code gen for more vjp code gen #56798

Merged

Conversation


@Charles-hit Charles-hit commented Aug 30, 2023

PR types

New features

PR changes

Others

Description

Pcard-66975
Background:
Building on PR56512, this PR scales up vjp generation for backward ops, prioritizing vjp support for the 26 high-priority ops that GPT/LLama depend on.
Changes in this PR:

  • Improve operator vjp code generation under the new IR, with support for parsing compat yaml information
  • Support mutable attributes in vjp
    Under the new IR, a mutable attribute becomes an input. However, when building a composite op, any attribute whose value must be inspected has to be a compile-time constant. The vjp interface therefore takes mutable attributes uniformly in input form and converts internally depending on the mode: in composite mode, the real attribute value is recovered from the defining op of the input; in non-composite mode, the input is simply passed through.
    Taking the sum op as an example, the generated vjp code is:
std::vector<std::vector<paddle::Tensor>> sum_vjp(const Tensor& x, const Tensor& out_grad, const Tensor& axis_, bool keepdim, bool reduce_all, const std::vector<std::vector<bool>>& stop_gradients) {
  std::vector<std::vector<paddle::Tensor>> vjp_res;
  for (auto arg: stop_gradients) {
    vjp_res.push_back(std::vector<paddle::Tensor>(arg.size()));
  }
  if (paddle::prim::StaticCompositeContext::Instance().IsBwdPrimEnabled()) {
    paddle::Tensor* x_grad = !stop_gradients[0][0] ? &vjp_res[0][0] : nullptr; 
    auto* axis_define_op = std::static_pointer_cast<primitive::LazyTensor>(axis_.impl())->getValue().dyn_cast<ir::OpResult>().GetDefiningOp();
    if (axis_define_op->name() != "pd.full_int_array") {
      PADDLE_THROW(platform::errors::Unimplemented(
          "We don't support dynamic tensors attribute axis for sum_grad composite "
          "for now. "));
    }
    auto axis = axis_define_op->attribute("value").dyn_cast<paddle::dialect::IntArrayAttribute>().data();

    details::sum_grad<LazyTensor>(x, out_grad, axis, keepdim, reduce_all, x_grad);
  } else {
    auto op_res = backend::sum_grad<LazyTensor>(x, out_grad, axis_, keepdim, reduce_all);
    vjp_res[0][0] = !stop_gradients[0][0] ? op_res : vjp_res[0][0];
  }
  return vjp_res;
}

In the differentiation layer, the sum network-building API is generated in two forms, taking the mutable attribute either as an input or as a plain attribute:

Tensor sum<LazyTensor>(const Tensor& x, const Tensor& axis_, DataType dtype, bool keepdim) {
  ir::OpResult x_res = std::static_pointer_cast<LazyTensor>(x.impl())->getValue().dyn_cast<ir::OpResult>();
  ir::OpResult axis_res = std::static_pointer_cast<LazyTensor>(axis_.impl())->getValue().dyn_cast<ir::OpResult>();
  auto op_res = paddle::dialect::sum(x_res, axis_res, dtype, keepdim);
  Tensor out(std::make_shared<LazyTensor>(op_res));
  return out;
}
Tensor sum<LazyTensor>(const Tensor& x, const IntArray& axis, DataType dtype, bool keepdim) {
  ir::OpResult x_res = std::static_pointer_cast<LazyTensor>(x.impl())->getValue().dyn_cast<ir::OpResult>();
  auto op_res = paddle::dialect::sum(x_res, axis.GetData(), dtype, keepdim);
  Tensor out(std::make_shared<LazyTensor>(op_res));
  return out;
}
  • Adapt intermediate outputs so they are not returned from the network-building API
    Taking reshape as an example, xshape is an intermediate output and will not be returned:
Tensor reshape<LazyTensor>(const Tensor& x, const Tensor& shape_) {
  ir::OpResult x_res = std::static_pointer_cast<LazyTensor>(x.impl())->getValue().dyn_cast<ir::OpResult>();
  ir::OpResult shape_res = std::static_pointer_cast<LazyTensor>(shape_.impl())->getValue().dyn_cast<ir::OpResult>();
  auto op_res = paddle::dialect::reshape(x_res, shape_res);
  Tensor out(std::make_shared<LazyTensor>(op_res));
  return out;
}
  • Adapt the network-building APIs used in vjp to multi-input/multi-output ops
    Multi-input, multi-output example:
std::vector<Tensor> concat_grad<LazyTensor>(const std::vector<Tensor>& x, const Tensor& out_grad, const Scalar& axis) {
  std::vector<ir::OpResult> x_res(x.size());
  std::transform(x.begin(), x.end(), x_res.begin(), [](const Tensor& t) {
    return std::static_pointer_cast<LazyTensor>(t.impl())->getValue().dyn_cast<ir::OpResult>();
  });
  ir::OpResult out_grad_res = std::static_pointer_cast<LazyTensor>(out_grad.impl())->getValue().dyn_cast<ir::OpResult>();
  auto op_res = paddle::dialect::concat_grad(x_res, out_grad_res, axis.to<int>());
  std::vector<Tensor> x_grad(op_res.size());
  std::transform(op_res.begin(), op_res.end(), x_grad.begin(), [](const ir::OpResult& res) {
    return Tensor(std::make_shared<LazyTensor>(res));
  });
  return x_grad;
}

Remaining work:

  • Generate vjp for the full set of high-priority ops that GPT/LLama depend on
  • Support optional inputs and a semantic representation of empty backward gradients

@paddle-bot

paddle-bot bot commented Aug 30, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@Charles-hit Charles-hit force-pushed the support_mutable_attributes branch 4 times, most recently from d5f618b to 8bb28a3 on August 31, 2023 01:35
def is_mutable_attribute(attr):
    return (
        attr['typename'] in ['Scalar', 'IntArray']
        and attr['support_tensor'] is True
    )
Contributor

How are attributes that fall into these two types ('Scalar', 'IntArray') but lack the support_tensor property handled?

Contributor Author

They are treated as compile-time constants.
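
A minimal sketch of that split, for illustration (the helper name classify_attr and the returned tags are invented here, not taken from the PR's gen.py):

def classify_attr(attr):
    # Scalar/IntArray attributes marked with support_tensor become
    # tensor inputs under the new IR; every other attribute stays a
    # plain, compile-time-constant attribute of the op.
    if is_mutable_attribute(attr):
        return 'tensor_input'
    return 'constant_attribute'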

@Charles-hit Charles-hit force-pushed the support_mutable_attributes branch 2 times, most recently from 8cc333a to 033acf4 on August 31, 2023 16:34
@cxxly

cxxly commented Sep 1, 2023

Use a meaningful PR description, e.g. "This PR extends the codegen logic to support the XX high-priority ops that GPT/LLama depend on."
In the detailed description, explain which specific features were improved.

);
{% elif outputs|length == 1 %}
return Tensor(std::make_shared<LazyTensor>(op_res));
return std::make_tuple({% for i in range(outputs|length) %}{{outputs[i].name}}{%- if i!=outputs|length - 1 -%}, {% endif %}{% endfor %});
Contributor

Split this into multiple macro functions. As a rule of thumb, keep each function under 50 lines and each module within one screen.

Contributor Author

This will be cleaned up uniformly in the next PR; we are merging a first version now so as not to block others' work.

{% endif %}
{% endif %}
{% endfor %}
{% endmacro %}
Contributor

Compiler constant inference (constant folding) is a fairly common technique; it could be encapsulated in a separate, readable function to make that part of the code's purpose clear.
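
As a sketch of this suggestion at the codegen level (the helper name render_constant_fold and the template string are invented for illustration; they are not the PR's actual code), the folding block seen inline in sum_vjp above could be emitted from a single well-named helper:

# Emits the C++ block that folds a mutable IntArray attribute back to a
# constant. Naming the helper documents the intent ("fold the attribute
# to a constant") and keeps the generated vjp bodies short.
FOLD_INT_ARRAY_TEMPLATE = """\
auto* {name}_define_op = std::static_pointer_cast<primitive::LazyTensor>(
    {name}_.impl())->getValue().dyn_cast<ir::OpResult>().GetDefiningOp();
if ({name}_define_op->name() != "pd.full_int_array") {{
  PADDLE_THROW(platform::errors::Unimplemented(
      "We don't support dynamic tensors attribute {name} for {op} composite for now."));
}}
auto {name} = {name}_define_op->attribute("value")
    .dyn_cast<paddle::dialect::IntArrayAttribute>().data();
"""

def render_constant_fold(attr_name, op_name):
    return FOLD_INT_ARRAY_TEMPLATE.format(name=attr_name, op=op_name)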

Contributor

+1

Contributor Author

This will be fixed uniformly in the next PR.

@Charles-hit Charles-hit changed the title from [PRIM][IR]Complete IR vjp code gen to [PRIM][IR]Complete IR vjp code gen for more vjp code gen on Sep 1, 2023
Contributor

@cxxly cxxly left a comment

LGTM

Contributor

@Aurelius84 Aurelius84 left a comment

LGTM

{% endif %}
{% endif %}
{% endfor %}
{% endmacro %}
Contributor

+1

Contributor

@heavyrain-lzy heavyrain-lzy left a comment

LGTM for tests_utils.py

for attr in attrs:
    if (
        attr['typename'] in ['Scalar', 'IntArray']
        and attr['support_tensor'] is True
    ):
Contributor

  1. The typename for Scalar is not only 'Scalar'; it may also be 'Scalar(int)', 'Scalar(int64_t)', etc. This function could use the following helpers from tests_utils.py
def is_scalar(s):
    return re.match(r"Scalar(\(\w+\))*", s) is not None


def is_intarray(s):
    return s == 'IntArray'

to perform the check (see the sketch after this list).
2. Under the new IR, does a mutable attribute also need to be checked with the condition below?

attr['tensor_name'] is not None or attr['tensors_name'] is not None
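
A sketch of what point 1 would look like, reusing the two helpers above (illustrative only; see the author's reply below):

def is_mutable_attribute(attr):
    # Accept any Scalar variant (Scalar, Scalar(int), Scalar(int64_t),
    # ...) or IntArray, combined with the support_tensor flag.
    return (
        is_scalar(attr['typename']) or is_intarray(attr['typename'])
    ) and attr['support_tensor'] is True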

Contributor Author

@Charles-hit Charles-hit Sep 1, 2023

Thanks for the reminder. On the first point, my understanding is that the parsed data types are already explicit, so no change should be needed; the second point is already handled in gen.py.

@Charles-hit Charles-hit merged commit 4abea95 into PaddlePaddle:develop Sep 1, 2023
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
…e#56798)

* Fix attr type error like concat axis

* Fix None input error

* Fix intermediate output

* support vjp code gen

---------

Co-authored-by: 0x45f <wangzhen45@baidu.com>